US Census Data

Lab Assignment Four: Multi-Layer Perceptron


Richmond Aisabor

Data Understanding

Data Description

Encode objects

Class Discretization

The box plot shows a right-skewed distribution: the median child poverty rate is 16.3%, the minimum is 0%, and the maximum is 100%. At the first and third quartiles, the child poverty rate is 6.2% and 31.6%, respectively. This information helps classify the dataset further by dividing the poverty values into four ranges. The census samples will be divided into four classes:

The child poverty distribution plot shows that each child poverty class has about the same number of observations, so the dataset is balanced. The dataset was balanced by applying quantization thresholds that divide the child poverty data into four classes; the thresholds were derived from the dynamic ranges between each quartile of the child poverty variable.
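The quartile-based discretization described above can be sketched as follows. This is a minimal illustration in pure Python; the cut points are the quartile values reported earlier (6.2%, 16.3%, 31.6%), and the class labels and sample rates are illustrative assumptions, not values from the actual census data.

```python
# Quartile-based discretization sketch: map each child poverty rate
# (0-100%) to one of four classes using the box-plot quartiles.
# Cut points come from the quartiles described above; labels are illustrative.

def discretize_poverty(rate):
    """Return a class index 0-3 for a child poverty rate."""
    if rate <= 6.2:       # at or below the first quartile
        return 0
    elif rate <= 16.3:    # between Q1 and the median
        return 1
    elif rate <= 31.6:    # between the median and Q3
        return 2
    else:                 # above the third quartile
        return 3

# Illustrative sample rates, one from each range
rates = [2.0, 10.5, 20.0, 45.0]
classes = [discretize_poverty(r) for r in rates]
```

Because the cut points are the empirical quartiles, roughly a quarter of the samples fall into each class, which is what makes the resulting dataset balanced.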

The training dataset should be balanced to prevent biasing the model in favor of the class with the most instances. It should also be balanced because there is equal interest in the classification performance of each class in the dataset. The advantage of using a test set with a distribution similar to the training set is that it gives a more realistic estimate of how the model will perform in production.

Training and Testing

Pre-Processing

Two Layer Perceptron

Normalized input
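A minimal min-max normalization sketch in pure Python is shown below; in practice a library scaler such as sklearn's MinMaxScaler would typically be used. The feature values are illustrative, not taken from the census data.

```python
# Min-max normalization sketch: rescale a numeric column into [0, 1]
# so that features with large ranges do not dominate the perceptron's inputs.

def min_max_normalize(values):
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Illustrative income values
incomes = [20000, 35000, 50000, 80000]
normalized = min_max_normalize(incomes)  # each value scaled into [0, 1]
```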

One hot encode categorical data
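One-hot encoding can be sketched as below in pure Python; pandas.get_dummies or sklearn's OneHotEncoder are the usual tools for this. The category values are illustrative assumptions.

```python
# One-hot encoding sketch: replace each categorical value with a binary
# vector that has a 1 in the position of that value's category.

def one_hot(values):
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return [[1 if index[v] == i else 0 for i in range(len(categories))]
            for v in values]

# Illustrative categorical column
states = ["VA", "MD", "VA", "DC"]
encoded = one_hot(states)  # columns ordered DC, MD, VA
```

One-hot encoding avoids imposing a spurious ordering on the categories, which would happen if they were simply mapped to integers.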

Compare performances of the trained models

After experimenting with the three models, each shows an accuracy score of about 0.25, which is roughly chance level for four balanced classes. The meaningful difference in performance is the direction of the loss: on the normalized dataset, the loss improved over the twenty epochs, while on the one-hot encoded and unedited datasets it got worse over the 20 epochs.

Modeling

Three Layer Perceptron

Four Layer Perceptron

Five Layer Perceptron